22 research outputs found
Interpretable statistics for complex modelling: quantile and topological learning
As the complexity of our data has increased exponentially in the last decades, so has our
need for interpretable features. This thesis revolves around two paradigms to approach
this quest for insights.
In the first part we focus on parametric models, where the problem of interpretability
can be seen as a “parametrization selection”. We introduce a quantile-centric
parametrization and we show the advantages of our proposal in the context of regression,
where it bridges the gap between classical generalized linear (mixed)
models and increasingly popular quantile methods.
The second part of the thesis, concerned with topological learning, tackles the
problem from a non-parametric perspective. As topology can be thought of as a way
of characterizing data in terms of their connectivity structure, it allows us to represent
complex and possibly high-dimensional data through a few features, such as the number of
connected components, loops and voids. We illustrate how the emerging branch of
statistics devoted to recovering topological structures in the data, Topological Data
Analysis, can be exploited both for exploratory and inferential purposes with a special
emphasis on kernels that preserve the topological information in the data.
Finally, we show with an application how these two approaches can borrow strength
from one another in the identification and description of brain activity through fMRI
data from the ABIDE project.
Topological summaries for Time-Varying Data
Topology has proven to be a useful tool in the current quest for “insights on the data”, since it characterises objects through their connectivity structure in an easy and interpretable way. More specifically, the new but growing field of Topological Data Analysis (TDA) deals with Persistent Homology, a multiscale version of Homology Groups summarized by the Persistence Diagram and its functional representations (Persistence Landscapes, Silhouettes, etc.). All of these objects, however, are designed for, and work only with, static point clouds. We define a new topological summary, the Landscape Surface, that takes into account the changes in the topology of a dynamical point cloud such as a (possibly very high-dimensional) time series. We prove its continuity and its stability and, finally, we sketch a simple example.
Supervised Learning with Indefinite Topological Kernels
Topological Data Analysis (TDA) is a recent and growing branch of statistics
devoted to the study of the shape of the data. In this work we investigate the
predictive power of TDA in the context of supervised learning. Since
topological summaries, most notably the Persistence Diagram, are typically
defined in complex spaces, we adopt a kernel approach to translate them into
more familiar vector spaces. We define a topological exponential kernel, we
characterize it, and we show that, despite not being positive semi-definite, it
can be successfully used in regression and classification tasks.
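As a rough illustration of the kernel approach, the sketch below builds an exponential kernel from a distance between Persistence Diagrams. The abstract does not give the exact form of the kernel, so the form exp(-d / h), the choice of the 1-Wasserstein distance with an L-infinity ground metric, and the names `wasserstein1`, `exponential_kernel` and `h` are all assumptions made for illustration. The brute-force matching is only practical for very small diagrams.

```python
from itertools import permutations
import math


def wasserstein1(diag_a, diag_b):
    """1-Wasserstein distance between two small persistence diagrams.

    Each diagram is a list of (birth, death) pairs. Points may be matched
    to each other (L-infinity cost) or to the diagonal (half persistence).
    Brute force over all matchings: illustrative only.
    """
    n, m = len(diag_a), len(diag_b)
    size = n + m  # each side is padded with diagonal slots for the other
    if size == 0:
        return 0.0

    def cost(i, j):
        a = diag_a[i] if i < n else None  # None stands for a diagonal slot
        b = diag_b[j] if j < m else None
        if a is None and b is None:
            return 0.0
        if a is None:
            return (b[1] - b[0]) / 2.0
        if b is None:
            return (a[1] - a[0]) / 2.0
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    return min(
        sum(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )


def exponential_kernel(diag_a, diag_b, h=1.0):
    """Assumed kernel form exp(-d / h); h is a hypothetical bandwidth."""
    return math.exp(-wasserstein1(diag_a, diag_b) / h)


d1 = [(0.0, 4.0)]
d2 = [(1.0, 3.0)]
similarity = exponential_kernel(d1, d2, h=1.0)
```

A kernel of this kind gives a Gram matrix that can be passed to kernel methods, but, as the abstract notes, positive semi-definiteness is not guaranteed.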
Persistence Flamelets: multiscale Persistent Homology for kernel density exploration
In recent years there has been noticeable interest in the study of the "shape
of data". Among the many ways a "shape" could be defined, topology is the most
general one, as it describes an object in terms of its connectivity structure:
connected components (topological features of dimension 0), cycles (features of
dimension 1) and so on. There is a growing number of techniques, generally
denoted as Topological Data Analysis, aimed at estimating topological
invariants of a fixed object; when we allow this object to change, however,
little has been done to investigate the evolution in its topology. In this work
we define the Persistence Flamelets, a multiscale version of one of the most
popular tools in TDA, the Persistence Landscape. We examine its theoretical
properties and we show how it could be used to gain insights on the bandwidth
parameter of KDEs.
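For readers unfamiliar with the Persistence Landscape mentioned above, the following minimal sketch evaluates it from a Persistence Diagram using the standard definition: each (birth, death) pair defines a tent function, and the k-th landscape at t is the k-th largest tent value. The function names are hypothetical, and the Flamelet itself, which tracks landscapes across a scale parameter, is not reproduced here.

```python
def tent(birth, death, t):
    """Tent function of one diagram point, evaluated at t."""
    return max(0.0, min(t - birth, death - t))


def landscape(diagram, k, t):
    """k-th Persistence Landscape (k = 1, 2, ...) evaluated at t.

    `diagram` is a list of (birth, death) pairs.
    """
    values = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
    return values[k - 1] if k <= len(values) else 0.0


diagram = [(0.0, 4.0), (1.0, 3.0)]
first = landscape(diagram, 1, 2.0)   # largest tent value at t = 2
second = landscape(diagram, 2, 2.0)  # second largest tent value at t = 2
```

Evaluating `landscape` on a grid of t values for a sequence of diagrams, one per bandwidth, would give a surface in the spirit of the multiscale construction the abstract describes.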
Improving local prevalence estimates of SARS-CoV-2 infections using a causal debiasing framework.
Funder: Oxford University | Jesus College, University of Oxford
Funder: Joint Biosecurity Centre
Global and national surveillance of SARS-CoV-2 epidemiology is mostly based on targeted schemes focused on testing individuals with symptoms. These tested groups are often unrepresentative of the wider population and exhibit test positivity rates that are biased upwards compared with the true population prevalence. Such data are routinely used to infer infection prevalence and the effective reproduction number, Rt, which affects public health policy. Here, we describe a causal framework that provides debiased fine-scale spatiotemporal estimates by combining targeted test counts with data from a randomized surveillance study in the United Kingdom called REACT. Our probabilistic model includes a bias parameter that captures the increased probability of an infected individual being tested, relative to a non-infected individual, and transforms observed test counts to debiased estimates of the true underlying local prevalence and Rt. We validated our approach on held-out REACT data over a 7-month period. Furthermore, our local estimates of Rt are indicative of 1-week- and 2-week-ahead changes in SARS-CoV-2-positive case numbers. We also observed increases in estimated local prevalence and Rt that reflect the spread of the Alpha and Delta variants. Our results illustrate how randomized surveys can augment targeted testing to improve statistical accuracy in monitoring the spread of emerging and ongoing infectious disease.
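To see why targeted testing inflates positivity, and how a bias parameter can undo it, consider the simplified sketch below. It is not the paper's probabilistic model: it assumes perfect tests and a single known ratio `r` = P(tested | infected) / P(tested | not infected), and the function names are hypothetical. Under these assumptions, Bayes' rule gives positivity = r·π / (r·π + 1 − π) for true prevalence π, which can be inverted in closed form.

```python
def expected_positivity(prevalence, r):
    """Test positivity implied by true prevalence and testing ratio r.

    r is the assumed ratio P(tested | infected) / P(tested | not infected).
    """
    return r * prevalence / (r * prevalence + (1.0 - prevalence))


def debiased_prevalence(positivity, r):
    """Invert expected_positivity to recover prevalence from positivity."""
    return positivity / (positivity + r * (1.0 - positivity))


true_prevalence = 0.02
r = 5.0  # infected people assumed five times as likely to be tested
observed = expected_positivity(true_prevalence, r)  # exceeds 0.02 when r > 1
recovered = debiased_prevalence(observed, r)
```

In practice the ratio is unknown, which is where a randomized survey such as REACT becomes useful: it provides unbiased prevalence information against which the bias can be estimated.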
Interoperability of Statistical Models in Pandemic Preparedness: Principles and Reality
We present "interoperability" as a guiding framework for statistical
modelling to assist policy makers asking multiple questions using diverse
datasets in the face of an evolving pandemic response. Interoperability
provides an important set of principles for future pandemic preparedness,
through the joint design and deployment of adaptable systems of statistical
models for disease surveillance using probabilistic reasoning. We illustrate
this through case studies for inferring spatial-temporal coronavirus disease
2019 (COVID-19) prevalence and reproduction numbers in England.
Persistence Flamelets: Topological Invariants for Scale Spaces
In recent years there has been noticeable interest in the study of the “shape of data”. Among the many ways a “shape” could be defined, topology is the most general one, as it describes an object in terms of its connectivity structure: connected components (topological features of dimension 0), cycles (features of dimension 1) and so on. There is a growing number of techniques, generally denoted as Topological Data Analysis, or TDA for short, aimed at estimating topological invariants of a fixed object; when we allow this object to change, however, little has been done to investigate the evolution in its topology. In this work we define the Persistence Flamelet, a multiscale version of one of the most popular tools in TDA, the Persistence Landscape. We examine its theoretical properties and we show its performance as both an exploratory and inferential tool. In addition, we provide an open-source implementation of the objects and methods presented in the R package pflamelet.